General Disclaimer 


One or more of the Following Statements may affect this Document 


• This document has been reproduced from the best copy furnished by the 
organizational source. It is being released in the interest of making available as 
much information as possible. 


• This document may contain data, which exceeds the sheet parameters. It was 
furnished in this condition by the organizational source and is the best copy 
available. 


• This document may contain tone-on-tone or color graphs, charts and/or pictures, 
which have been reproduced in black and white. 


• This document is paginated as submitted by the original source. 


• Portions of this document are not fully legible due to the historical nature of some 
of the material. However, it is the best reproduction available from the original 
submission. 


Produced by the NASA Center for Aerospace Information (CASI) 





Department of Mathematics
University of Houston, Houston, Texas

(NASA-CR-141881) A COUNTER EXAMPLE IN LINEAR FEATURE SELECTION THEORY (Houston Univ.) 7 p HC $3.25 CSCL 12A  G3/64  N75-26741  Unclas  27272

PREPARED FOR
EARTH OBSERVATION DIVISION, JSC
UNDER
CONTRACT NAS-9-12777

3801 CULLEN BLVD.
HOUSTON, TEXAS 77004






A Counter-Example in Linear Feature Selection Theory


By


Dennison R. Brown
and
Matthew J. O'Malley
Department of Mathematics
University of Houston
March, 1975


Report #41


A Counter-Example in Linear Feature Selection Theory
D.R. Brown and M.J. O'Malley


Introduction : 

The linear feature selection problem in multi-class pattern recognition can
be regarded as that of linearly transforming statistical information from
n-dimensional (real Euclidean) space into k-dimensional space, while requiring
that average interclass divergence in the transformed space decrease as little
as possible.

Divergence, as used in this paper, will be the expected interclass
divergence derived from Hajek two-class divergence as defined, for example,
in [4]. It is known [3] that there always exists a k x n matrix B such that
the transformation determined by B maximizes the divergence in k-dimensional
space. It is also known [3] that, if Q is any k x k invertible matrix,
and B is as defined above, then QB again maximizes the divergence in k-space.
The purpose of this note is to show that the converse of this result is false;
specifically, we shall show the existence of two matrices, $B_1$ and $B_2$, each
of which maximizes transformed divergence, which are not related in the fashion
$B_2 = QB_1$ for any k x k matrix Q.

The negative resolution of this rather long standing conjecture is unfortunate
from the computational standpoint, since derivation of matrices B which maximize
transformed divergence is relatively inefficient. Several researchers have
addressed the problem of obtaining such B's ([1], [2], and [6]), but the latest
and most efficient treatment known to us is [3]. A common error in examining
special cases of the problem [2] is the incorrect assumption of equality
between matrices of the forms $(B\Omega_i B^T)^{-1}$ and $B\Omega_i^{-1}B^T$. Simple
examples (see [5], for instance) show this to be false, even if all $\Omega_i$ are
diagonal matrices.
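
A small numerical illustration of this pitfall is sketched below. It assumes the two forms in question are $(B\Omega B^T)^{-1}$ and $B\Omega^{-1}B^T$; the 1 x 2 matrix B and the diagonal matrix $\Omega$ are hypothetical values chosen for easy arithmetic, and are not taken from [5].

```python
from fractions import Fraction as F

# Hypothetical 1 x 2 matrix B with B B^T = 1 (a 3-4-5 unit vector).
B = [F(3, 5), F(4, 5)]
# Hypothetical diagonal covariance matrix Omega = diag(1, 4), stored as its diagonal.
omega = [F(1), F(4)]

def quad_form(b, diag):
    """Return b * diag(diag) * b^T for a row vector b (a scalar, since k = 1)."""
    return sum(x * x * d for x, d in zip(b, diag))

bbt = quad_form(B, [F(1), F(1)])                          # B B^T
inv_of_transformed = 1 / quad_form(B, omega)              # (B Omega B^T)^{-1}
transformed_inv = quad_form(B, [1 / d for d in omega])    # B Omega^{-1} B^T

print(bbt, inv_of_transformed, transformed_inv)
```

Here $BB^T = 1$ and $\Omega$ is diagonal, yet the two quantities differ. In this example $B^TB$ does not commute with $\Omega$.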

In the sequel, we avoid this pitfall while computing "best B's", and
assure the maximality of transformed divergence by selecting covariance matrices
and means for which it can be shown that divergence in the transformed space
equals divergence in the original space. Since divergence is a monotone
function of dimension [4], this is sufficient to establish maximality. While
the choices of values are made with an eye toward computational simplicity,
and are therefore subject to the charge of impracticality, it should be noted
that the existence of inequivalent solutions in this restricted case casts
doubt that there will ever arise a situation, however practical, in which only
a single equivalence class of solutions may be assumed a priori.


SECTION 1 - Necessary Divergence Formulae . 

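Let $\Omega_1, \ldots, \Omega_m$ and $\mu_1, \ldots, \mu_m$ be the covariance matrices and means
for m classes, where, for each i = 1, ..., m, $\Omega_i$ is an n x n positive definite
matrix and $\mu_i$ is a column n-vector. Let $S_i = \sum_{j=1,\, j \neq i}^{m} (\Omega_j + \delta_{ij}\delta_{ij}^T)$, where
$\delta_{ij} = \mu_i - \mu_j$. Then, assuming equal a priori probabilities, the average interclass
divergence for these m classes is given by:

\[ D = \tfrac{1}{2}\,\mathrm{tr}\Big(\sum_{i=1}^{m} \Omega_i^{-1} S_i\Big) - \tfrac{1}{2}\,m(m-1)n, \tag{1} \]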




while, if B is a k x n matrix, the B-average interclass divergence is:

\[ D_B = \tfrac{1}{2}\,\mathrm{tr}\Big(\sum_{i=1}^{m} (B\Omega_i B^T)^{-1}(B S_i B^T)\Big) - \tfrac{1}{2}\,m(m-1)k, \tag{2} \]

where "tr" represents the trace function.
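
The invariance under invertible Q cited from [3] in the Introduction can be checked directly from formula (2). The sketch below is an illustrative derivation, not the argument given in [3]; it assumes (2) has the form $D_B = \tfrac{1}{2}\mathrm{tr}\big(\sum_i (B\Omega_i B^T)^{-1}(BS_iB^T)\big) - \tfrac{1}{2}m(m-1)k$ and uses only the fact that similar matrices have equal trace.

```latex
\begin{align*}
(QB\Omega_i B^TQ^T)^{-1}(QBS_iB^TQ^T)
  &= (Q^T)^{-1}(B\Omega_iB^T)^{-1}Q^{-1}\,Q\,(BS_iB^T)Q^T \\
  &= (Q^T)^{-1}\big[(B\Omega_iB^T)^{-1}(BS_iB^T)\big]Q^T .
\end{align*}
% Taking traces and summing over i, the conjugation by Q^T drops out:
\[ D_{QB} = D_B \quad \text{for every invertible } k \times k \text{ matrix } Q. \]
```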

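Next, let $\mathcal{B} = \{B \in M_{k \times n} : BB^T = I_k \text{ and } (B^TB)\Omega_i = \Omega_i(B^TB),\ i = 1, \ldots, m\}$,
where $I_k$ is the k x k identity matrix, and $M_{k \times n}$ is the set of all k x n
real matrices.

Observe that, for any $B \in \mathcal{B}$, $(B\Omega_i B^T)^{-1} = B\Omega_i^{-1}B^T$, so that, in this case,
(2) may be rewritten as:

\[ D_B = \tfrac{1}{2}\,\mathrm{tr}\Big(B\Big(\sum_{i=1}^{m} \Omega_i^{-1} S_i\Big)B^T\Big) - \tfrac{1}{2}\,m(m-1)k. \tag{3} \]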
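
The step from (2) to (3) can be spelled out as follows. This is a sketch under the assumption that the commutation condition defining the admissible set is imposed on the n x n matrix $P = B^TB$; the symbol P is introduced here only for brevity.

```latex
% Assumptions: B B^T = I_k and P\Omega_i = \Omega_i P with P = B^T B,
% hence also P\Omega_i^{-1} = \Omega_i^{-1} P, and B P = (B B^T) B = B.
\begin{align*}
(B\Omega_i B^T)(B\Omega_i^{-1}B^T)
  &= B\,\Omega_i P\,\Omega_i^{-1}B^T
   = B P\,\Omega_i\Omega_i^{-1}B^T
   = (BB^T)(BB^T) = I_k, \\
\mathrm{tr}\big((B\Omega_i^{-1}B^T)(BS_iB^T)\big)
  &= \mathrm{tr}\big(B\,\Omega_i^{-1}P\,S_iB^T\big)
   = \mathrm{tr}\big(BP\,\Omega_i^{-1}S_iB^T\big)
   = \mathrm{tr}\big(B\,\Omega_i^{-1}S_i\,B^T\big).
\end{align*}
```

Summing the second line over i and substituting into (2) gives (3).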


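Since $\mathcal{B}$ is closed and bounded in $M_{k \times n}$ (regarded as $E^{kn}$) and $D_B$, as a
function from $\mathcal{B}$ into the real numbers, is continuous, it follows that this
function attains a maximum; that is, there exists $\hat{B} \in \mathcal{B}$ such that

\[ D_{\hat{B}} \geq D_B \quad \text{for all } B \in \mathcal{B}. \]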

Suppose, in addition to the above restriction, that the following condition
holds:

\[ \sum_{i=1}^{m} \Omega_i^{-1} S_i \ \text{ is a positive definite diagonal matrix.} \tag{$*$} \]


If the diagonal entries of this matrix are denoted $c_{11}, \ldots, c_{nn}$, then, in this
case, the divergence reduces to:

\[ D = \tfrac{1}{2}\sum_{i=1}^{n} c_{ii} - \tfrac{1}{2}\,m(m-1)n. \tag{4} \]

Sufficient conditions that (*) holds are that each $\Omega_i$ is a diagonal matrix
and $\mu_i = \mu_j$ for all i, j.
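
As a concrete illustration of formula (4) under these sufficient conditions, the sketch below computes the diagonal entries $c_{jj}$ and the divergence D for two hypothetical classes in three dimensions. It assumes $S_i$ is the sum of the $\Omega_j$ over $j \neq i$ (the mean-difference terms vanish when the means are equal); the function names and the numerical values are illustrative only.

```python
from fractions import Fraction as F

def diag_of_E(omegas):
    """Diagonal of E = sum_i Omega_i^{-1} S_i for diagonal Omega_i (given as lists
    of diagonal entries) and equal means, with S_i = sum over j != i of Omega_j."""
    n = len(omegas[0])
    totals = [sum(om[p] for om in omegas) for p in range(n)]
    return [sum((totals[p] - om[p]) / om[p] for om in omegas) for p in range(n)]

def divergence(c, m):
    """Formula (4): D = (1/2) * (sum of c_jj - m(m-1)n)."""
    return (sum(c) - m * (m - 1) * len(c)) / 2

# Hypothetical data: m = 2 classes, n = 3, diagonal covariances, equal means.
omegas = [[F(1), F(1), F(2)], [F(4), F(1), F(2)]]
c = diag_of_E(omegas)
D = divergence(c, len(omegas))
print(c, D)
```

Only the first coordinate, where the two variances differ, contributes to D; the other two coordinates each give $c_{jj} = m(m-1) = 2$.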

SECTION 2 - Conditions under which $D_A = D$ .

Let $A \in M_{k \times n}$ satisfy the following two conditions:

(1) Each row of A has exactly one non-zero entry and
that entry is one;

(2) k columns of A have exactly one non-zero entry,
while the remaining n-k columns have all entries equal to zero.


Any such matrix A has the following properties:

(a) $AA^T = I_k$;

(b) $A^TA$ is a diagonal matrix having exactly k diagonal
entries equal to one with the remaining diagonal entries equal
to zero;

(c) if E is a diagonal matrix, $E = \mathrm{diag}(e_{11}, \ldots, e_{nn})$, then $AEA^T$ is
a diagonal matrix, $AEA^T = \mathrm{diag}(d_{11}, \ldots, d_{kk})$, where the $d_{jj}$ are among the
diagonal elements of E.

Furthermore, given any collection $\{d_{jj} : j = 1, \ldots, k\}$ of
diagonal elements of E, there exists a k x n matrix A satisfying
conditions (1) and (2) such that $AEA^T$ is a diagonal matrix having these
values as diagonal entries in the correct order. Although the verification
of the above statements is tedious, it is straightforward, and we omit it.
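
The omitted verification can at least be spot-checked numerically. The sketch below builds a matrix satisfying conditions (1) and (2) from a list of distinct column indices and evaluates properties (a), (b), and (c); the helper names and the diagonal values of E are hypothetical.

```python
def selection_matrix(cols, n):
    """k x n 0/1 matrix whose r-th row has its single 1 in column cols[r].
    The entries of cols must be distinct for condition (2) to hold."""
    return [[1 if j == c else 0 for j in range(n)] for c in cols]

def matmul(X, Y):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]

e = [7, 3, 9, 5]        # hypothetical diagonal of E (n = 4)
cols = [2, 0]           # request e_33 first, then e_11 (0-based columns 2 and 0)
A = selection_matrix(cols, len(e))
At = transpose(A)

AAt = matmul(A, At)                      # property (a): the k x k identity
AtA = matmul(At, A)                      # property (b): diagonal with k ones
AEAt = matmul(matmul(A, diag(e)), At)    # property (c): the chosen entries, in order
print(AAt, AtA, AEAt)
```

The order of `cols` fixes the order of the diagonal entries of $AEA^T$, which is what "in the correct order" requires.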
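Now suppose that condition (*) is satisfied, and that

\[ E = \sum_{i=1}^{m} \Omega_i^{-1} S_i = \mathrm{diag}(c_{11}, \ldots, c_{nn}), \]

a diagonal matrix. Fix k < n; by property
(c), there exists a k x n matrix A satisfying conditions (1) and (2) for
which $AEA^T$ is the diagonal matrix $\mathrm{diag}(d_{11}, \ldots, d_{kk})$, where $d_{11}, \ldots, d_{kk}$ are
the largest diagonal entries of E and $d_{11} \geq \cdots \geq d_{kk}$. Therefore

\[ D_A = \tfrac{1}{2}\Big[\sum_{j=1}^{k} d_{jj} - m(m-1)k\Big], \]

following formulae (3) and (4). Hence

\[ \tfrac{1}{2}\Big[\sum_{j=1}^{k} d_{jj} - m(m-1)k\Big] = D_A \leq D = \tfrac{1}{2}\Big[\sum_{j=1}^{n} c_{jj} - m(m-1)n\Big]. \]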


It follows from this inequality that $m(m-1)(n-k) \leq \sum_{j=k+1}^{n} d_{jj}$, where
$d_{k+1\,k+1} \geq \cdots \geq d_{nn}$ represent the remaining n-k diagonal entries of E,
arranged in descending order. In particular, if k = n-1, then $m(m-1) \leq d_{nn}$.
Thus, since $d_{nn} \leq \cdots \leq d_{k+1\,k+1}$, it follows that $D_A = D$ if and only if
$m(m-1) = d_{k+1\,k+1}$.
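
The equality criterion can be illustrated with a short computation. The sketch below assumes the divergence formulas reduce to one half of (sum of retained diagonal entries minus m(m-1) times their number), as in (3) and (4); the function names and the diagonal values are hypothetical.

```python
from fractions import Fraction as F

def divergence_from_diagonal(entries, m):
    """(1/2) * (sum of entries - m(m-1) * number of entries)."""
    return (sum(entries) - m * (m - 1) * len(entries)) / F(2)

def best_selection_gap(c, m, k):
    """Return (D, D_A, d_{k+1,k+1}) when A keeps the k largest diagonal entries of E."""
    d = sorted(c, reverse=True)
    D = divergence_from_diagonal(d, m)
    D_A = divergence_from_diagonal(d[:k], m)
    return D, D_A, d[k]

# m = 2, so m(m-1) = 2.  Largest discarded entry equals 2: no divergence is lost.
no_loss = best_selection_gap([F(17, 4), 2, 2], m=2, k=1)
# Largest discarded entry is 3 > 2: the transformed divergence is strictly smaller.
loss = best_selection_gap([F(17, 4), 3, 2], m=2, k=1)
print(no_loss, loss)
```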


SECTION 3 - A family of counter-examples . 

To construct two k x n matrices, both of which maximize divergence
in the transformed space, and which are not row equivalent, we proceed as
follows. Let $\Omega_1, \ldots, \Omega_m$ be positive definite covariance matrices with equal
means. Assuming $n \geq 3$ and $2 \leq k < n$, we require, for each i, that

\[ \Omega_i = \begin{pmatrix} C^{(i)} & Z \\ Z & I_{n-(k-1)} \end{pmatrix}, \]

where $C^{(i)}$ is a (k-1) x (k-1) positive definite
submatrix, and Z denotes the zero submatrix of appropriate dimension. By
direct computation, it follows that $\sum_{i=1}^{m} \Omega_i^{-1} S_i$ is a diagonal matrix of the
form:

\[ \mathrm{diag}\big(u_{11}, \ldots, u_{k-1\,k-1},\ m(m-1), \ldots, m(m-1)\big), \]

where $u_{jj} > 0$ for each j, and hence

\[ D = \tfrac{1}{2}\Big[\sum_{j=1}^{k-1} u_{jj} - (k-1)m(m-1)\Big]. \]


Let $A_1$ and $A_2$ be the following k x n matrices:



Clearly, both $A_1$ and $A_2$ satisfy conditions (1) and (2) of Section 2,
and thus, by the derivation in that section, $D_{A_1} = D_{A_2} = D$. Thus, both $A_1$
and $A_2$ yield divergence which is the same as the divergence using all n
channels of information, and is therefore best possible.
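
A minimal numerical instance of the construction is sketched below for m = 2, n = 3, k = 2. Everything specific here is an assumption made for illustration: the 1 x 1 blocks $C^{(1)} = (1)$ and $C^{(2)} = (4)$, the choice of $A_1$ as the matrix keeping coordinates 1 and 2, and of $A_2$ as the matrix keeping coordinates 1 and 3. The function evaluates formula (2) using the fact that, for a 0/1 selection matrix and diagonal covariances, $A\Omega_i A^T$ is the diagonal matrix of the retained entries.

```python
from fractions import Fraction as F

def avg_divergence(omegas, idx):
    """Formula (2) for the selection matrix keeping coordinates idx (0-based),
    assuming diagonal covariances (lists of diagonal entries) and equal means."""
    m, k = len(omegas), len(idx)
    total = F(0)
    for i, om_i in enumerate(omegas):
        for p in idx:
            # p-th diagonal entry of S_i: sum of the other classes' variances.
            s_ip = sum(om_j[p] for j, om_j in enumerate(omegas) if j != i)
            total += F(s_ip) / om_i[p]
    return (total - m * (m - 1) * k) / 2

# Hypothetical covariances: Omega_i = diag(C^(i), 1, 1) with C^(1) = 1, C^(2) = 4.
omegas = [[1, 1, 1], [4, 1, 1]]

D_full = avg_divergence(omegas, [0, 1, 2])  # all n = 3 channels
D_A1 = avg_divergence(omegas, [0, 1])       # hypothetical A_1: rows e_1, e_2
D_A2 = avg_divergence(omegas, [0, 2])       # hypothetical A_2: rows e_1, e_3
D_bad = avg_divergence(omegas, [1, 2])      # drops the informative coordinate
print(D_full, D_A1, D_A2, D_bad)
```

Both hypothetical selections retain the full divergence, while a selection that discards the first coordinate does not.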

Finally, we observe that $A_1$ and $A_2$ are not row equivalent. Suppose,
to the contrary, that there exists an invertible k x k matrix Q such that
$A_2 = QA_1$. Then the subspace of n-space spanned by the row vectors of $A_1$
is equal to the subspace spanned by the row vectors of $A_2$. However,
$e_{k+1} = (0, \ldots, 0, 1, 0, \ldots, 0)$ is the k-th row of $A_2$ and clearly $e_{k+1}$ is
not in the subspace spanned by the row vectors of $A_1$. Therefore $A_1$ and
$A_2$ are not row equivalent.
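
Row equivalence can also be tested mechanically: two k x n matrices of rank k have the same row space exactly when stacking them does not raise the rank. The sketch below applies this test to hypothetical matrices for n = 3, k = 2 (rows $e_1, e_2$ and rows $e_1, e_3$); the `rank` helper is an illustrative Gauss-Jordan routine over the rationals, not part of the original note.

```python
from fractions import Fraction as F

def rank(rows):
    """Rank of a matrix (list of rows) by Gauss-Jordan elimination over the rationals."""
    M = [[F(x) for x in row] for row in rows]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                factor = M[i][col] / M[r][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

# Hypothetical selection matrices for n = 3, k = 2.
A1 = [[1, 0, 0], [0, 1, 0]]
A2 = [[1, 0, 0], [0, 0, 1]]

stacked_rank = rank(A1 + A2)   # A1 + A2 concatenates the row lists
row_equivalent = rank(A1) == rank(A2) == stacked_rank
print(rank(A1), rank(A2), stacked_rank, row_equivalent)
```

A stacked rank larger than k means the two row spaces differ, so no invertible Q can satisfy $A_2 = QA_1$.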


SECTION 4 - Conclusions .

In this note we have given a family of examples to show that, even under
extremely strong conditions, it is not possible to assume that all matrix solutions
which maximize transformed divergence are row equivalent.